An Efficient Algorithm To Induce Minimum Average Lookahead Grammars For Incremental LR Parsing
نویسندگان
چکیده
We define a new learning task, minimum average lookahead grammar induction, with strong potential implications for incremental parsing in NLP and cognitive models. Our thesis is that a suitable learning bias for grammar induction is to minimize the degree of lookahead required, on the underlying tenet that language evolution drove grammars to be efficiently parsable in incremental fashion. The input to the task is an unannotated corpus, plus a nondeterministic constraining grammar that serves as an abstract model of environmental constraints confirming or rejecting potential parses. The constraining grammar typically allows ambiguity and is itself poorly suited for an incremental parsing model, since it gives rise to a high degree of nondeterminism in parsing. The learning task, then, is to induce a deterministic LR (k) grammar under which it is possible to incrementally construct one of the correct parses for each sentence in the corpus, such that the average degree of lookahead needed to do so is minimized. This is a significantly more difficult optimization problem than merely compiling LR (k) grammars, since k is not specified in advance. Clearly, naı̈ve approaches to this optimization can easily be computationally infeasible. However, by making combined use of GLR ancestor tables and incremental LR table construction methods, we obtain an O(n3 + 2m) greedy approximation algorithm for this task that is quite efficient in practice.
منابع مشابه
Symbolic Lookaheads for Bottom-up Parsing
We present algorithms for the construction of LALR(1) parsing tables, and of LR(1) parsing tables of reduced size. We first define specialized characteristic automata whose states are parametric w.r.t. variables symbolically representing lookahead-sets. The propagation flow of lookaheads is kept in the form of a system of recursive equations, which is resolved to obtain the concrete LALR(1) tab...
متن کاملIncremental Parser Generation for Tree Adjoining Grammars
This paper describes the incremental generation of parse tables for the LRtype parsing of Tree Adjoining Languages (TALs). The algorithm presented handles modi cations to the input grammar by updating the parser generated so far. In this paper, a lazy generation of LR-type parsers for TALs is de ned in which parse tables are created by need while parsing. We then describe an incremental parser ...
متن کاملDeterministic Left to Right Parsing of Tree Adjoining Languages
We define a set of deterministic bottom-up left to right parsers which analyze a subset of Tree Adjoining Languages. The LR parsing strategy for Context Free Grammars is extended to Tree Adjoining Grammars (TAGs). We use a machine, called Bottom-up Embedtied Push Down Automaton (BEPDA), that recognizes in a bottom-up fashion the set of Tree Adjoining Languages (and exactly this se0. Each parser...
متن کاملAn Efficient Augmented-Context-Free Parsing Algorithm
An efficient parsing algorithm for augmented context-free grammars is introduced, and its application to on-line natural language interfaces discussed. The algorithm is a generalized LR parsing algorithm, which precomputes an LR shift-reduce parsing table (possibly with multiple entries) from a given augmented context-free grammar. Unlike the standard LR parsing algorithm, it can handle arbitra...
متن کاملConstruction of Efficient Generalized LR Parsers
We show how LR parsers for the analysis of arbitrary contextfree grammars can be derived from classical Earley’s parsing algorithm. The result is a Generalized LR parsing algorithm working at complexity O(n) in the worst case, which is achieved by the use of dynamic programming to represent the non-deterministic evolution of the stack instead of graph-structured stack representations, as has of...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2004